Abstract
Unsupervised domain adaptation techniques are essential for real-world image classification. The domain of images, i.e., the space of all possible images, is so enormous that models trained on any dataset will inevitably suffer from out-of-domain issues. One promising research direction is to use domain adaptation methods to adapt models trained on a source domain to a target domain. Adversarial Discriminative Domain Adaptation (ADDA) is a typical adversarial-learning-based unsupervised domain adaptation method. Although it has proven effective on simple and small datasets, it requires sophisticated training strategies and sometimes fails to converge. We propose to explicitly align the distribution of the model's output with that of an adapted model, which also serves as the initialization for the adversarial training. In this way, the adversarial process is forced to search within a space of solutions at least as good as the initialization. Experiments on our proposed Tiny-16-Class-ImageNet show that our method is effective and efficient in terms of accuracy and training time.
Introduction
Background
By generalizability, we refer to a model's ability to perform equally well on unseen data. The word "domain" in this article denotes the space of input features together with their marginal distribution. Specifically, for image classification tasks, the domain of a training dataset consists of the set of all possible images together with the marginal distribution of the images in that dataset [6]. Generalizability is crucial for image classification models: the space of possible images is so large that any dataset captures only a small fraction of it, and a model that fails to generalize beyond its training data is of little practical use. Domain shift, which is common in practice, refers to a difference between two domains. For example, a model trained on images taken in daylight may be applied to images taken at night; unsurprisingly, it usually fails. Perturbations such as noise patterns imposed on images are another source of domain shift. One promising approach to domain shift is domain adaptation, which aims to adapt a model trained on a source domain to a target domain. In this project, we investigate the unsupervised domain adaptation problem, which does not require labels in the target domain.
Related Work
Many domain adaptation algorithms have been proposed to counter the performance degradation caused by domain shift. Deep Coral [4] extends the unsupervised domain adaptation method Coral to learn a nonlinear transformation that aligns the correlations of layer activations in deep neural networks. Adversarial Discriminative Domain Adaptation (ADDA) [5] combines a discriminative model with generative adversarial networks to learn a discriminative mapping by fooling a domain discriminator.
Method
Datasets
Figure 2: Sample noises in the Tiny-16-Class-ImageNet dataset. Top row, left to right: no noise, uniform noise, salt-and-pepper noise. Bottom row, left to right: rotation, high-pass, low-pass. Image manipulations follow the procedure in [1].
We conduct experiments on two datasets: Tiny-16-Class-ImageNet and MNIST-USPS [2, 3]. Most experiments are done on Tiny-16-Class-ImageNet, which we built ourselves following the guidelines in [1]. Tiny-16-Class-ImageNet has three subsets (a training set, a validation set and a test set) containing 10015, 1269 and 10350 images, respectively. All three subsets share 16 general classes (e.g., bear rather than brown bear) but come from different domains. The training and validation sets contain samples of different subclasses (e.g., brown bear vs. black bear). We apply different noise patterns to generate different domains; sample noises are illustrated in Figure 2. The test set contains samples from every subclass (brown bear, black bear, etc.). We also test our proposed method on the MNIST-USPS dataset.
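To make the corruption step concrete, here is a minimal sketch of two of the noise patterns in Figure 2 (uniform and salt-and-pepper noise), assuming grayscale images with pixel values in [0, 1] stored as nested lists. The function names and the `width` and `p` parameters are our own illustrative choices; the exact parameterization of [1] is not reproduced here, so reading "uniform noise (0.5)" as `width=0.5` is an assumption.

```python
import random

# Hypothetical helpers (not the dataset-generation code): images are grayscale,
# stored as nested lists of floats in [0, 1].


def add_uniform_noise(image, width, rng):
    """Add i.i.d. noise drawn from U(-width, width) to every pixel, then clip to [0, 1]."""
    return [
        [min(1.0, max(0.0, px + rng.uniform(-width, width))) for px in row]
        for row in image
    ]


def add_salt_and_pepper_noise(image, p, rng):
    """With probability p, replace a pixel by 0.0 (pepper) or 1.0 (salt), each equally likely."""
    noisy = []
    for row in image:
        noisy.append([rng.choice((0.0, 1.0)) if rng.random() < p else px for px in row])
    return noisy


rng = random.Random(0)
clean = [[0.5] * 4 for _ in range(4)]  # a flat gray 4x4 "image"
uniform = add_uniform_noise(clean, width=0.5, rng=rng)
salt_pepper = add_salt_and_pepper_noise(clean, p=0.1, rng=rng)
```

Clipping after adding noise keeps every corrupted image a valid image, so the same classifier input pipeline can be reused for every domain.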
Deep Coral
We adopt the idea of Deep Coral [4] and simply align the second-order statistics of the last layer of the backbone network by adding a coral loss. This method is simple yet effective and easily extensible. When experimenting on Tiny-16-Class-ImageNet, we replace the Deep Coral backbone with a ResNet-50 pretrained on ImageNet. We use the same SGD hyperparameters as in [4]. The weight λ controlling the coral loss is also set as in [4], except on the MNIST-USPS dataset, where we use a different value.
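For reference, the coral loss of [4] is the squared Frobenius distance between the source and target feature covariances, scaled by 1/(4d²), where d is the feature dimension. The minimal sketch below shows only this arithmetic in plain Python; in training, the same computation would be written in an autodiff framework so that gradients reach the backbone. The function names are hypothetical, and `coral_weight` plays the role of λ.

```python
def covariance(features):
    """Unbiased sample covariance (d x d) of an n x d feature matrix given as nested lists."""
    n, d = len(features), len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    return [
        [sum((row[i] - means[i]) * (row[j] - means[j]) for row in features) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]


def coral_loss(source_features, target_features):
    """Coral loss: squared Frobenius distance between covariances, scaled by 1 / (4 d^2)."""
    d = len(source_features[0])
    c_s = covariance(source_features)
    c_t = covariance(target_features)
    squared_frobenius = sum((c_s[i][j] - c_t[i][j]) ** 2 for i in range(d) for j in range(d))
    return squared_frobenius / (4 * d * d)


def total_loss(classification_loss, source_features, target_features, coral_weight):
    """Deep Coral objective: source classification loss plus a weighted coral term."""
    return classification_loss + coral_weight * coral_loss(source_features, target_features)


source = [[0.0, 0.0], [2.0, 0.0]]  # variance only along the first feature
target = [[0.0, 0.0], [0.0, 2.0]]  # variance only along the second feature
print(coral_loss(source, target))  # 0.5
```

Because the loss depends only on covariances, shifting all target features by a constant does not change it; only the second-order structure is aligned.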
ADDA
We also adopt the idea of ADDA: we first learn a discriminative representation using data from the source domain and then learn a separate encoding that maps the target domain to the source domain with a domain-adversarial loss. We use ResNet-50 (excluding the last layer) as the encoder backbone and a layer MLP as the discriminator with hidden size of . The pretrained ResNet-50 is frozen during adversarial training. Adam is used as the optimizer with and . The learning rate is set to and the batch size is . During the adaptation stage, the target encoder is updated every steps.
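The two adversarial objectives of ADDA [5] can be summarized as follows: the discriminator is trained with a binary cross-entropy loss to separate source features (label 1) from target features (label 0), and the target encoder is trained with inverted labels to fool it, while the source encoder stays fixed. The sketch below computes these losses from discriminator probabilities in plain Python. The function names and the update interval `k` are hypothetical placeholders, not values from our experiments.

```python
import math

EPS = 1e-12  # guards against log(0)


def discriminator_loss(d_on_source, d_on_target):
    """Binary cross-entropy for the domain discriminator.

    d_on_source / d_on_target hold the discriminator's probabilities that each
    encoded sample comes from the source domain (label 1 = source, 0 = target).
    """
    loss_source = -sum(math.log(p + EPS) for p in d_on_source) / len(d_on_source)
    loss_target = -sum(math.log(1.0 - p + EPS) for p in d_on_target) / len(d_on_target)
    return loss_source + loss_target


def target_encoder_loss(d_on_target):
    """Inverted-label GAN loss: the target encoder is rewarded when the
    discriminator labels target features as source (label 1)."""
    return -sum(math.log(p + EPS) for p in d_on_target) / len(d_on_target)


def update_target_encoder_now(step, k):
    """Discriminator is updated every step; the target encoder only every k steps.

    k is a placeholder for the update interval hyperparameter.
    """
    return step % k == 0
```

Updating the target encoder less often than the discriminator gives the discriminator time to become informative before the encoder chases it.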
ADDA-CORAL
We propose a new method that combines Deep Coral and ADDA: Deep Coral serves as the pretraining for ADDA, and on the target domain we align the second-order statistics of the classification outputs of the fixed pretrained encoder and the ADDA-trained target encoder. The overall architecture is illustrated in Figure 1. In our experiments, we find that vanilla ADDA ruins the pretrained encoder because of a poorly trained discriminator. To make better use of the Deep Coral pretrained encoder as an initialization while ensuring that the learned target encoder generates similar features for the target and source domains, we use the coral loss only to align the classification output of the ADDA-trained encoder with that of the fixed pretrained encoder, and we gradually decrease the weight of the coral loss.
Our underlying assumption is that the best possible solution lies near the already good initialization in the solution space, that is, within reach of Adam-based optimization starting from it.
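A minimal sketch of the target-encoder objective described above, written in plain Python. It assumes the same coral formula as Deep Coral [4] and, since the decay schedule is not specified, a linear decay of the coral weight to zero; this schedule is an assumption, not our exact training code. `pretrained_outputs` and `adapted_outputs` stand for the classification outputs of the fixed Deep Coral pretrained model and of the ADDA-trained target model on the same batch of target images; all names are hypothetical.

```python
import math

EPS = 1e-12  # guards against log(0)


def covariance(features):
    """Unbiased sample covariance (d x d) of an n x d matrix given as nested lists."""
    n, d = len(features), len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in features) / (n - 1)
             for j in range(d)] for i in range(d)]


def coral_loss(a, b):
    """Squared Frobenius distance between the covariances of a and b, over 4 d^2."""
    d = len(a[0])
    c_a, c_b = covariance(a), covariance(b)
    return sum((c_a[i][j] - c_b[i][j]) ** 2 for i in range(d) for j in range(d)) / (4 * d * d)


def coral_weight(step, total_steps, initial_weight):
    """ASSUMED schedule: linear decay of the coral weight from initial_weight to 0."""
    return initial_weight * (1.0 - min(step / total_steps, 1.0))


def adda_coral_target_loss(d_on_target, pretrained_outputs, adapted_outputs,
                           step, total_steps, initial_weight):
    """ADDA inverted-label term plus a decaying coral term that keeps the adapted
    model's target-domain outputs close to those of the fixed pretrained model."""
    adversarial = -sum(math.log(p + EPS) for p in d_on_target) / len(d_on_target)
    lam = coral_weight(step, total_steps, initial_weight)
    return adversarial + lam * coral_loss(pretrained_outputs, adapted_outputs)
```

Early in adaptation the coral term dominates and anchors the target encoder to its Deep Coral initialization; as its weight decays, the adversarial term takes over. The discriminator loss is unchanged from ADDA.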
Results
Table 1: Results of our Deep Coral+ADDA on Tiny-16-Class-ImageNet and MNIST-USPS.
Setting | Source | Target | Acc |
---|---|---|---|
ResNet-50 | train | val | |
ADDA | train | val | |
Deep Coral | train | val | |
Ours | train | val | |
LeNet | MNIST | USPS | |
ADDA | MNIST | USPS | |
Deep Coral | MNIST | USPS | |
Ours | MNIST | USPS | |
val: validation set with uniform noise (0.5).
Table 2: Results of our Deep Coral+ADDA on the unseen test set of Tiny-16-Class-ImageNet.
Setting | Train Source | Train Target | Unseen Target | Acc |
---|---|---|---|---|
ResNet-50 | train | None | Test | |
ResNet-50-ImageNet | train | None | Test | |
Deep Coral | train | val | Test | |
Ours | train | val | Test | |
val: validation set with uniform noise (0.5).
Figure 3: Classification accuracy (%) on different domains. Model is trained only on the source domain. Models to are adapted to one target domain (red rectangle) via ADDA. to are similar, except that Deep Coral is used. The best results for each domain and method are shown in bold blue.
Discussion
The results in Figure 3 show the improvements of ADDA and Deep Coral on the target domain. Deep Coral generally outperforms ADDA by a large margin, except on the High-Pass target domain. The failure on this domain is most likely due to the drastic domain shift between High-Pass and the other domains, as illustrated in Figure 2 in the Datasets section. Deep Coral also generalizes better to unseen domains, most likely because it does not alter the encoder much and the encoder is pretrained on ImageNet (albeit without any added noise).
Table 1 shows the results of our proposed Deep Coral+ADDA on Tiny-16-Class-ImageNet and MNIST-USPS. We add uniform noise (0.5) to the validation set, which makes the domain shift from the training set even larger and the domain adaptation task even harder. The high performance and concrete improvements of our Deep Coral+ADDA method over the other settings validate the effectiveness of our modifications and design. We also test our method on an unseen target domain that was not used in training and observe significantly better results, as shown in Table 2 in the appendix.
References
1. Robert Geirhos, Carlos R. Medina Temme, Jonas Rauber, Heiko H. Schütt, Matthias Bethge, and Felix A. Wichmann. Generalisation in humans and deep neural networks. arXiv preprint arXiv:1808.08750, 2018.
2. Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.
3. Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. 2011.
4. Baochen Sun and Kate Saenko. Deep CORAL: Correlation alignment for deep domain adaptation. In European Conference on Computer Vision, pages 443-450. Springer, 2016.
5. Eric Tzeng, Judy Hoffman, Kate Saenko, and Trevor Darrell. Adversarial discriminative domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7167-7176, 2017.
6. Mei Wang and Weihong Deng. Deep visual domain adaptation: A survey. Neurocomputing, 312:135-153, 2018.